Feature Enriched Nonparametric Bayesian Co-clustering
نویسندگان
چکیده
Co-clustering has emerged as an important technique for mining relational data, especially when data are sparse and high-dimensional. Coclustering simultaneously groups the different kinds of objects involved in a relation. Most co-clustering techniques typically only leverage the entries of the given contingency matrix to perform the two-way clustering. As a consequence, they cannot predict the interaction values for new objects. In many applications, though, additional features associated to the objects of interest are available. The Infinite Hidden Relational Model (IHRM) has been proposed to make use of these features. As such, IHRM has the capability to forecast relationships among previously unseen data. The work on IHRM lacks an evaluation of the improvement that can be achieved when leveraging features to make predictions for unseen objects. In this work, we fill this gap and re-interpret IHRM from a co-clustering point of view. We focus on the empirical evaluation of forecasting relationships between previously unseen objects by leveraging object features. The empirical evaluation demonstrates the effectiveness of the feature-enriched approach and identifies the conditions under which the use of features is most useful, i.e., with sparse data.
منابع مشابه
Nonparametric Bayesian Co-clustering Ensembles
A nonparametric Bayesian approach to co-clustering ensembles is presented. Similar to clustering ensembles, coclustering ensembles combine various base co-clustering results to obtain a more robust consensus co-clustering. To avoid pre-specifying the number of co-clusters, we specify independent Dirichlet process priors for the row and column clusters. Thus, the numbers of rowand column-cluster...
متن کاملNonparametric Bayesian Models for Unsupervised Learning
NONPARAMETRIC BAYESIAN MODELS FOR UNSUPERVISED LEARNING Pu Wang, PhD George Mason University, 2011 Dissertation Director: Carlotta Domeniconi Unsupervised learning is an important topic in machine learning. In particular, clustering is an unsupervised learning problem that arises in a variety of applications for data analysis and mining. Unfortunately, clustering is an ill-posed problem and, as...
متن کاملCluster and Feature Modeling from Combinatorial Stochastic Processes
One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models are a set of interrelated stochastic processes, most notably the Dirichlet proce...
متن کاملMultiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which sc...
متن کاملA Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views
Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering inte...
متن کامل